
<!DOCTYPE html
  PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
   
      <title>18.5.&nbsp;优化列表操作</title>
      <link rel="stylesheet" href="../diveintopython.css" type="text/css">
      <link rev="made" href="mailto:f8dy@diveintopython.org">
      <meta name="generator" content="DocBook XSL Stylesheets V1.52.2">
      <meta name="keywords" content="Python, Dive Into Python, tutorial, object-oriented, programming, documentation, book, free">
      <meta name="description" content="Python from novice to pro">
      <link rel="home" href="../toc/index.html" title="Dive Into Python">
      <link rel="up" href="index.html" title="第&nbsp;18&nbsp;章&nbsp;性能优化">
      <link rel="previous" href="dictionary_lookups.html" title="18.4.&nbsp;优化字典查找">
      <link rel="next" href="string_manipulation.html" title="18.6.&nbsp;优化字符串操作">
   </head>
   <body>
      <table id="Header" width="100%" border="0" cellpadding="0" cellspacing="0" summary="">
         <tr>
            <td id="breadcrumb" colspan="5" align="left" valign="top">导航：<a href="../index.html">起始页</a>&nbsp;&gt;&nbsp;<a href="../toc/index.html">Dive Into Python</a>&nbsp;&gt;&nbsp;<a href="index.html">性能优化</a>&nbsp;&gt;&nbsp;<span class="thispage">优化列表操作</span></td>
            <td id="navigation" align="right" valign="top">&nbsp;&nbsp;&nbsp;<a href="dictionary_lookups.html" title="上一页: “优化字典查找”">&lt;&lt;</a>&nbsp;&nbsp;&nbsp;<a href="string_manipulation.html" title="下一页: “优化字符串操作”">&gt;&gt;</a></td>
         </tr>
         <tr>
            <td colspan="3" id="logocontainer">
               <h1 id="logo"><a href="../index.html" accesskey="1">深入 Python :Dive Into Python 中文版</a></h1>
               <p id="tagline">Python 从新手到专家 [Dip_5.4b_CPyUG_Release]</p>
            </td>
            <td colspan="3" align="right">
               <form id="search" method="GET" action="http://www.google.com/custom">
                  <p><label for="q" accesskey="4">Find:&nbsp;</label><input type="text" id="q" name="q" size="20" maxlength="255" value=""> <input type="submit" value="搜索"><input type="hidden" name="domains" value="woodpecker.org.cn/diveintopython"><input type="hidden" name="sitesearch" value="www.woodpecker.org.cn/diveintopython"></p>
               </form>
            </td>
         </tr>
      </table>
      <!--#include virtual="/inc/ads" -->
      <div class="section" lang="zh_cn">
         <div class="titlepage">
            <div>
               <div>
                  <h2 class="title"><a name="soundex.stage3"></a>18.5.&nbsp;优化列表操作
                  </h2>
               </div>
            </div>
            <div></div>
         </div>
         <div class="abstract">
            <p>Soundex 算法的第三步是去除连续重复字符。怎样做是最佳方法？</p>
         </div>
         <p>这里是我们目前在 <tt class="filename">soundex/stage2/soundex2c.py</tt> 中的代码：
         </p>
         <div class="informalexample"><pre class="programlisting">
    digits2 = digits[0]
    <span class='pykeyword'>for</span> d <span class='pykeyword'>in</span> digits[1:]:
        <span class='pykeyword'>if</span> digits2[-1] != d:
            digits2 += d
</pre></div>
         <p>这里是 <tt class="filename">soundex2c.py</tt> 的性能表现：
         </p>
         <div class="informalexample"><pre class="screen">
<tt class="prompt">C:\samples\soundex\stage2&gt;</tt><span class="userinput">python soundex2c.py</span>
<span class="computeroutput">Woo             W000 11.437645008
Pilgrim         P426 13.2825062962
Flingjingwaller F452 18.5570110168</span>
</pre></div>
         <p>第一件事是考虑，考察在循环的每一轮都检查 <tt class="varname">digits[-1]</tt> 是否有效率。列表索引代价大吗？如果把上一个数字存在另外的变量中以便检查是否会获益？
         </p>
         <p>这里的 <tt class="filename">soundex/stage3/soundex3a.py</tt> 将回答这个问题：
         </p>
         <div class="informalexample"><pre class="programlisting">
    digits2 = <span class='pystring'>''</span>
    last_digit = <span class='pystring'>''</span>
    <span class='pykeyword'>for</span> d <span class='pykeyword'>in</span> digits:
        <span class='pykeyword'>if</span> d != last_digit:
            digits2 += d
            last_digit = d
</pre></div>
         <p><tt class="filename">soundex3a.py</tt> 并不比 <tt class="filename">soundex2c.py</tt> 运行得快多少，而且甚至可能更会慢些 (差异还没有大到可以确信这一点)：
         </p>
         <div class="informalexample"><pre class="screen">
<tt class="prompt">C:\samples\soundex\stage3&gt;</tt><span class="userinput">python soundex3a.py</span>
<span class="computeroutput">Woo             W000 11.5346048171
Pilgrim         P426 13.3950636184
Flingjingwaller F452 18.6108927252</span>
</pre></div>
         <p>为什么 <tt class="filename">soundex3a.py</tt> 不更快呢？其实 <span class="application">Python</span> 的索引功能恰恰很有效。重复使用 <tt class="varname">digits2[-1]</tt> 根本没什么问题。另一方面，手工保留上一个数字意味着我们每存储一个数字都要为<span class="emphasis"><em>两个</em></span> 变量赋值，这便抹杀了我们避开索引查找所带来的微小好处。
         </p>
         <p>让我们从本质上不同的方法来思考。如果可以把字符串当作字符列表来对待，那么使用列表遍历遍寻列表便成为可能。问题是代码需要使用列表中的上一个字符，而且使用列表遍历做到这一点并不容易。</p>
         <p>但是，使用内建的 <tt class="function">range()</tt> 函数创建一个索引数字构成的列表是可以的。使用这些索引数字一步步搜索列表并拿出与前面不同的字符。这样将使你得到一个字符串列表，使用字符串方法 <tt class="methodname">join()</tt> 便可重建字符串。
         </p>
         <p>这便是 <tt class="filename">soundex/stage3/soundex3b.py</tt>：
         </p>
         <div class="informalexample"><pre class="programlisting">
    digits2 = <span class='pystring'>""</span>.join([digits[i] <span class='pykeyword'>for</span> i <span class='pykeyword'>in</span> range(len(digits))
                       <span class='pykeyword'>if</span> i == 0 <span class='pykeyword'>or</span> digits[i-1] != digits[i]])
</pre></div>
         <p>这样快了吗？一个字，否。</p>
         <div class="informalexample"><pre class="screen">
<tt class="prompt">C:\samples\soundex\stage3&gt;</tt><span class="userinput">python soundex3b.py</span>
<span class="computeroutput">Woo             W000 14.2245271396
Pilgrim         P426 17.8337165757
Flingjingwaller F452 25.9954005327</span>
</pre></div>
         <p>有可能因为目前的这些方法都是 “<span class="quote">字符串中心化</span>” 的。<span class="application">Python</span> 可以通过一个命令把一个字符串转化为一个字符列表：<tt class="function">list('abc')</tt> 返回 <tt class="literal">['a', 'b', 'c']</tt>。更进一步，列表可以被很快地<span class="emphasis"><em>就地</em></span> 改变。与其一砖一瓦地建造一个新的列表 (或者字符串)，为什么不选择操作列表的元素呢？
         </p>
         <p>这便是 <tt class="filename">soundex/stage3/soundex3c.py</tt>，就地修改列表去除连续重复元素：
         </p>
         <div class="informalexample"><pre class="programlisting">
    digits = list(source[0].upper() + source[1:].translate(charToSoundex))
    i=0
    <span class='pykeyword'>for</span> item <span class='pykeyword'>in</span> digits:
        <span class='pykeyword'>if</span> item==digits[i]: <span class='pykeyword'>continue</span>
        i+=1
        digits[i]=item
    <span class='pykeyword'>del</span> digits[i+1:]
    digits2 = <span class='pystring'>""</span>.join(digits)
</pre></div>
         <p>这比 <tt class="filename">soundex3a.py</tt> 或 <tt class="filename">soundex3b.py</tt> 快吗？不，实际上这是目前最慢的一种方法<sup>[<a name="d0e39824" href="#ftn.d0e39824">16</a>]</sup>：
         </p>
         <div class="informalexample"><pre class="screen">
<tt class="prompt">C:\samples\soundex\stage3&gt;</tt><span class="userinput">python soundex3c.py</span>
<span class="computeroutput">Woo             W000 14.1662554878
Pilgrim         P426 16.0397885765
Flingjingwaller F452 22.1789341942</span>
</pre></div>
         <p>我们在这儿除了试用了几种 “<span class="quote">聪明</span>” 的技术，根本没有什么进步。到目前为止最快的方法就是最直接的原始方法 (<tt class="filename">soundex2c.py</tt>)。有时候聪明未必有回报。
         </p>
         <div class="example"><a name="d0e39852"></a><h3 class="title">例&nbsp;18.5.&nbsp;目前的最佳结果：<tt class="filename">soundex/stage2/soundex2c.py</tt></h3><pre class="programlisting"><span class='pykeyword'>
import</span> string, re

allChar = string.uppercase + string.lowercase
charToSoundex = string.maketrans(allChar, <span class='pystring'>"91239129922455912623919292"</span> * 2)

<span class='pykeyword'>def</span><span class='pyclass'> soundex</span>(source):
    <span class='pykeyword'>if</span> (<span class='pykeyword'>not</span> source) <span class='pykeyword'>or</span> (<span class='pykeyword'>not</span> source.isalpha()):
        <span class='pykeyword'>return</span> <span class='pystring'>"0000"</span>
    digits = source[0].upper() + source[1:].translate(charToSoundex)
    digits2 = digits[0]
    <span class='pykeyword'>for</span> d <span class='pykeyword'>in</span> digits[1:]:
        <span class='pykeyword'>if</span> digits2[-1] != d:
            digits2 += d
    digits3 = re.sub(<span class='pystring'>'9'</span>, <span class='pystring'>''</span>, digits2)
    <span class='pykeyword'>while</span> len(digits3) &lt; 4:
        digits3 += <span class='pystring'>"0"</span>
    <span class='pykeyword'>return</span> digits3[:4]

<span class='pykeyword'>if</span> __name__ == <span class='pystring'>'__main__'</span>:
    <span class='pykeyword'>from</span> timeit <span class='pykeyword'>import</span> Timer
    names = (<span class='pystring'>'Woo'</span>, <span class='pystring'>'Pilgrim'</span>, <span class='pystring'>'Flingjingwaller'</span>)
    <span class='pykeyword'>for</span> name <span class='pykeyword'>in</span> names:
        statement = <span class='pystring'>"soundex('%s')"</span> % name
        t = Timer(statement, <span class='pystring'>"from __main__ import soundex"</span>)
        <span class='pykeyword'>print</span> name.ljust(15), soundex(name), min(t.repeat())
</pre></div>
         <div class="footnotes">
            <h3 class="footnotetitle">Footnotes</h3>
            <div class="footnote">
               <p><sup>[<a name="ftn.d0e39824" href="#d0e39824">16</a>] </sup><tt class="filename">soundex3c.py</tt> 比 <tt class="filename">soundex3b.py</tt> 快。――译注
               </p>
            </div>
         </div>
      </div>
      <table class="Footer" width="100%" border="0" cellpadding="0" cellspacing="0" summary="">
         <tr>
            <td width="35%" align="left"><br><a class="NavigationArrow" href="dictionary_lookups.html">&lt;&lt;&nbsp;优化字典查找</a></td>
            <td width="30%" align="center"><br>&nbsp;<span class="divider">|</span>&nbsp;<a href="index.html#soundex.divein" title="18.1.&nbsp;概览">1</a> <span class="divider">|</span> <a href="timeit.html" title="18.2.&nbsp;使用 timeit 模块">2</a> <span class="divider">|</span> <a href="regular_expressions.html" title="18.3.&nbsp;优化正则表达式">3</a> <span class="divider">|</span> <a href="dictionary_lookups.html" title="18.4.&nbsp;优化字典查找">4</a> <span class="divider">|</span> <span class="thispage">5</span> <span class="divider">|</span> <a href="string_manipulation.html" title="18.6.&nbsp;优化字符串操作">6</a> <span class="divider">|</span> <a href="summary.html" title="18.7.&nbsp;小结">7</a>&nbsp;<span class="divider">|</span>&nbsp;
            </td>
            <td width="35%" align="right"><br><a class="NavigationArrow" href="string_manipulation.html">优化字符串操作&nbsp;&gt;&gt;</a></td>
         </tr>
         <tr>
            <td colspan="3"><br></td>
         </tr>
      </table>
      <div class="Footer">
         <p class="copyright">Copyright © 2000, 2001, 2002, 2003, 2004 <a href="mailto:mark@diveintopython.org">Mark Pilgrim</a></p>
         <p class="copyright">Copyright © 2001, 2002, 2003, 2004, 2005, 2006, 2007 <a href="mailto:python-cn@googlegroups.com">CPyUG (邮件列表)</a></p>
      </div>
   </body>
</html>